Prosody and phonetic variability: Lessons learned from acoustic model clustering
Authors
Abstract
Most research on the use of prosody in automatic speech processing has focused on F0, energy and duration correlates of prosodic structure. However, there are multiple sources of evidence suggesting that there are spectral correlates as well. This paper presents an analysis of prosodically labeled conversational speech data using acoustic parameters and clustering techniques that are standard in speech recognition. We find acoustic differences primarily associated with segment position at prosodic constituent onsets and at prominent syllables. Importantly, phones at fluent vs. disfluent boundaries are frequently placed in different clusters. These differences can be leveraged in a “multiple pronunciation” acoustic model to aid in detecting fluent vs. disfluent prosodic boundaries and, potentially, to improve recognition accuracy.
Similar resources
Distributional Learning of Vowel Categories Is Supported by Prosody in Infant-Directed Speech
Infants’ acquisition of phonetic categories involves a distributional learning mechanism that operates on acoustic dimensions of the input. However, natural infant-directed speech shows large degrees of phonetic variability, and the resulting overlap between categories suggests that category learning based on distributional clustering may not be feasible without constraints on the learning proc...
An Acoustic Study of Emotivity-Prosody Interface in Persian Speech Using the Tilt Model
This paper aims to explore some acoustic properties (i.e. duration and pitch amplitude of speech) associated with three different emotions: anger, sadness and joy against neutrality as a reference point, all being intentionally expressed by six Persian speakers. The primary purpose of this study is to find out if there is any correspondence between the given emotions and prosody patterning in P...
Phonetic and speaker variations in automatic emotion classification
The speech signal contains information that characterises the speaker and the phonetic content, together with the emotion being expressed. This paper looks at the effect of this speaker- and phoneme-specific information on speech-based automatic emotion classification. The performances of a classification system using established acoustic and prosodic features for different phonemes are compared,...
Acquisition of prosody: The role of variability*
Although some phonetic variability is inevitable in speech production, adult speech is fairly consistent. Thus, part of becoming a competent adult speaker is learning to appropriately limit the variability in one’s speech. It is generally believed that phonology is mastered relatively early; however, this does not take into account the refinement of articulation required to rein in the variabi...
Prosody-dependent Acoustic Modeling for Mandarin Speech Recognition
This paper reports a study on introducing prosodic information into acoustic modeling (AM) for speech recognition. It extends the conventional context-dependent (CD) triphone HMM modeling approach to further consider the dependency of the phone model on the break type of the nearby inter-syllable boundary. Four break types are considered, including major break, minor break, normal non-break, and t...